Speech Enhancement Method Based on Modified Encoder-Decoder Pyramid Transformer
Abstract
The development of new technologies for voice communication has created a need to improve speech enhancement methods. Modern users of information systems place high demands on both the intelligibility of the signal and its perceptual quality. In this work we propose an approach to solving the speech enhancement problem. For this, a modified pyramidal transformer neural network with an encoder-decoder structure was developed. The encoder compressed the spectrum into a series of internal embeddings. The decoder, using self-attention transformations, reconstructed the mask of the complex ratio of the cleaned and noisy signals based on the embeddings calculated by the encoder. Two possible loss functions were considered for training the proposed model. It is shown that the use of frequency encoding mixed with the input data improved the performance of the approach. The model was trained and tested on the DNS Challenge 2021 dataset and compared with modern methods. We provide a qualitative analysis of the training process of the implemented network. The network gradually moved from simple noise masking in the early epochs to restoring the missing formant components of the speaker's voice in later epochs. This was reflected in the metrics and the subjective quality of the enhanced speech.
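The complex ratio mask mentioned in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation: in the paper the mask is predicted by the decoder, whereas here an ideal mask is computed directly from a known clean spectrum. The function names, the array sizes, and the random spectra standing in for real short-time spectra are all assumptions made for illustration.

```python
import numpy as np


def ideal_complex_ratio_mask(clean_spec, noisy_spec, eps=1e-12):
    """Element-wise complex ratio M such that M * noisy_spec ~= clean_spec."""
    # Divide complex numbers via the conjugate; eps guards against a zero bin.
    denom = np.abs(noisy_spec) ** 2 + eps
    return clean_spec * np.conj(noisy_spec) / denom


def apply_complex_mask(mask, noisy_spec):
    """Complex multiplication changes both the magnitude and the phase of each bin."""
    return mask * noisy_spec


# Hypothetical spectra of shape (frequency bins, frames); random values
# stand in for the short-time spectra of real speech and noise.
rng = np.random.default_rng(0)
shape = (257, 100)
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
noisy = clean + noise

mask = ideal_complex_ratio_mask(clean, noisy)
enhanced = apply_complex_mask(mask, noisy)

# Largest deviation between the masked noisy spectrum and the clean one.
max_error = float(np.max(np.abs(enhanced - clean)))
```

Because the mask is complex-valued, it can correct phase as well as magnitude, which a real-valued magnitude mask cannot do. A trained network only approximates this ideal mask, so its reconstruction error is larger than in this sketch.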
Similar resources
Speech enhancement based on the subspace method
A method of speech enhancement using microphone-array signal processing based on the subspace method is proposed and evaluated in this paper. The method consists of the following two stages corresponding to the different types of noise. In the first stage, less-directional ambient noise is reduced by eliminating the noise-dominant subspace. It is realized by weighting the eigenvalues of the spat...
Speech enhancement with weighted denoising auto-encoder
A novel speech enhancement method with Weighted Denoising Auto-encoder (WDA) is proposed in this paper. A weighted reconstruction loss function is introduced to the conventional Denoising Auto-encoder (DA), and makes it suitable for the task of speech enhancement. First, the proposed WDA is used to model the relationship between the noisy and clean power spectrums of speech signal. Then, the es...
A Hierarchical Encoder-Decoder Model for Statistical Parametric Speech Synthesis
Current approaches to statistical parametric speech synthesis using Neural Networks generally require input at the same temporal resolution as the output, typically a frame every 5 ms, or in some cases at waveform sampling rate. It is therefore necessary to fabricate highly-redundant frame-level (or sample-level) linguistic features at the input. This paper proposes the use of a hierarchical enco...
Modified Spectral Subtraction Based Speech Enhancement
A one-microphone speech enhancement (SE) algorithm, a modification of the extended spectral subtraction (ESS) method [1], is presented in this paper. The algorithm can be used for the reduction of additive stationary and quasi-stationary colored broad-band noise in noisy speech in hands-free communication terminals as well as in a number of other applications. The achieved noise reduct...
A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain
The quality of a speech signal degrades significantly in the presence of environmental noise, which impairs the performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, single-channel speech enhancement of signals corrupted by additive noise is considered. A dictionary-based algorithm is proposed to train the speech...
Journal
Journal title: Trudy Instituta sistemnogo programmirovaniâ
Year: 2022
ISSN: 2079-8156, 2220-6426
DOI: https://doi.org/10.15514/ispras-2022-34(4)-10